Loading R Packages

First, load these packages, since all of them are required to execute further steps. If you can’t load them / are missing one, please use install.packages() to re install or download them, and then run the code. Note that Seurat requires specific versions to work correctly; we typically use v.4.4.0, but the latest version (5.1.0) could also work.

suppressMessages(library(mclust))
suppressMessages(library(MatrixExtra))
suppressMessages(library(ggplot2))
suppressMessages(library(dplyr))
suppressMessages(library(sctransform))
suppressMessages(library(future))
suppressMessages(library(Seurat))
suppressMessages(library(cowplot))
suppressMessages(library(stringr))
suppressMessages(library(patchwork))
suppressMessages(library(Matrix))
suppressMessages(library(Scillus))
suppressMessages(library(MAST))
suppressMessages(library(tidyr))
suppressMessages(library(ggthemes))
suppressMessages(library(magrittr))
suppressMessages(library(SeuratData))
suppressMessages(library(SeuratObject))
suppressMessages(library(devtools))
suppressMessages(library(htmltools))
suppressMessages(library(ScaledMatrix))
suppressMessages(library(sparseMatrixStats))
suppressMessages(library(DelayedMatrixStats))
suppressMessages(library(BPCells))
suppressMessages(library(tidyr))

Loading Seurat Objects

Load the seurat object that contains the single cell data obtained from samples. A seurat object is a dataset that contains the original data of the donors, along with cell types identified, diabetic status, etc. Load the metadata to look at all information contained in this seurat object in various categories. In this case, reharm_SCT_2024Jun18_small is a downsampled version of a processed seurat object containing human pancreatic islet sample data.

reharm_SCT_2024Jun18_small <- readRDS("~/AS/objects/reharm_SCT_2024Jun18_small.rds")

reharm_SCT_2024Jun18_small
An object of class Seurat 
68799 features across 10000 samples within 3 assays 
Active assay: RNA (36601 features, 3000 variable features)
 3 layers present: counts, data, scale.data
 2 other assays present: Protein, SCT
 4 dimensional reductions calculated: pca, harmony, umap, DecontX_NoGroup_umap
reharm_SCT_2024Jun18_small@meta.data

Viewing Cell Types in Seurat Objects

We are interested in viewing different cell types, so we use Idents() and levels(). This shows us different cell types associated with endocrine and inflammatory processes such as alpha, beta, delta-gamma, macrophages, mast cells, etc.

Idents(reharm_SCT_2024Jun18_small) <- "cell_type"
levels(reharm_SCT_2024Jun18_small)
 [1] "Alpha"                      "Alpha_Prolif"               "Beta"                       "Endocrine_MAP2"            
 [5] "Dedifferentiated_Endocrine" "Delta_Gamma"                "Macrophages"                "Mast"                      
 [9] "Lymphocytes"                "B_Cells"                    "Myofibroblasts"             "Pericytes"                 
[13] "Endothelial"                "Schwann"                    "Ductal"                     "Acinar"                    
[17] "LowQuality_Endocrine"       "LowQuality_Endothelial"     "LowQuality_Exo"             "LowQuality"                

Deleting Cell Types Not Relevant to Seurat Objects

We view the cell types present in our large seurat object. We see that there are cell types associated with low quality, and it would be beneficial to the dataset if we removed them. To do this, we follow the code below.

reharm_SCT_2024Jun18_small <- subset(reharm_SCT_2024Jun18_small, idents = c("LowQuality_Endocrine", "LowQuality_Endothelial", "LowQuality_Exo", "LowQuality"), invert = TRUE)

Visualizing Cell Types Using Dimensional Plots.

Now that we have removed low quality cell types, we re-construct a DimPlot with an updated registry of cell types for our reference.

DimPlot(reharm_SCT_2024Jun18_small)+ggtitle("ALL CELL TYPES")

Subsetting Cells of Interest from Seurat Object

Now, we need to make a subset from this data that contains only mast cells, because we are interested in characterizing mast cells in diabetic patients. This generates a new seurat object called mast_cells_cluster. If you want to subset some other type of cells, use the same syntax but edit to include your cell type instead.

mast_cells_cluster <- subset(reharm_SCT_2024Jun18_small, idents = "Mast")
levels(mast_cells_cluster)
[1] "Mast"

Seurat Clusters and How to View Them

Within mast cells, we need to view the different clusters that have been generated under this seurat object. Seurat clusters are usually named 1, 2, 3, etc. that are used to cluster the cells together for more easier organization.

Idents(mast_cells_cluster) <- "seurat_clusters"
levels(mast_cells_cluster)
 [1] "0"  "1"  "2"  "3"  "4"  "5"  "6"  "7"  "8"  "9"  "10"

Single Cell RNA Sequencing of Seurat Clusters

Now, it’s time to prepare this data for scRNA seq analysis. This is done through running a PCA analysis and then running a UMAP analysis to obtain a UMAP plot (can also use TSNE, as they both test linear dimensionality to produce a DimPlot). Then, we use FindNeighbors to compute a shared nearest neighbors (SNN) graph, which helps in clustering.

mast_cells_cluster <- RunPCA(mast_cells_cluster, dims = 1:30, verbose = FALSE) %>%
  RunTSNE(dims = 1:30, verbose = FALSE) %>%
  FindNeighbors(verbose = TRUE) %>%
  FindClusters(resolution = 0.1, verbose = TRUE)
Computing nearest neighbor graph
Computing SNN
Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck

Number of nodes: 2266
Number of edges: 79677

Running Louvain algorithm...
0%   10   20   30   40   50   60   70   80   90   100%
[----|----|----|----|----|----|----|----|----|----|
**************************************************|
Maximum modularity in 10 random starts: 0.9346
Number of communities: 3
Elapsed time: 0 seconds

Saving the Analyzed Cells as a new Seurat Object

We have now obtained a seurat object with our required scRNA seq parameters. Save this object using saveRDS().

saveRDS(mast_cells_cluster, "mast_cells_cluster.rds")

Visualizing This Object as a TSNE Dimensional Plot

Here’s a TSNE plot to exhibit our clustering results:

TSNEPlot(mast_cells_cluster, group.by = "seurat_clusters")

And a TSNE plot to show the same, grouped by heterogenous donors, and conditions:

TSNEPlot(mast_cells_cluster, group.by = "Donor")

TSNEPlot(mast_cells_cluster, group.by = "Condition")

Finding Markers for Differential Expressed Genes

Now we need to find marker genes for this processed mast cell cluster object under the 3 clusters identified. Obtaining these marker genes will help us downstream in seeing which genes are expressed when and why. They are called differential expressed genes (DEG).

Idents(mast_cells_cluster) <- "seurat_clusters"
mast_cell_markers <-  Seurat::FindAllMarkers(mast_cells_cluster, only.pos = FALSE, test.use = "roc")

Save markers as table.

write.csv(mast_cell_markers, "mast_cell_markers_clean.csv")
head(mast_cell_markers_clean)

Classifying DEG Markers Based on Diabetic Conditions:

Change Idents() to look at differential expressed genes by condition, because we want to view the marker genes we obtained before based on Lean condition (which is non-diabetic), and T2D condition. We ignore Obese group because it contains both diabetic and non-diabetic samples, which has shown to produce discrepancy in results.

Idents(mast_cells_cluster) <- "Condition"
levels(mast_cells_cluster)
[1] "Lean"   "Obese"  "PreT2D" "T2D"   

Visualizing Such Markers through TSNE Dimensional Plots

Let’s now view the analysis we’ve done using TSNEPlot. We first obtain a general plot of all conditions for reference. You can also do this using UMAPPlot or DimPlot.

TSNEPlot(mast_cells_cluster,group.by = "Condition")

Subsetting Markers into Diabetic Conditions of Interest

We further subset the seurat object mast_cells_cluster based on the conditions we want, which are lean and T2D. We then reprocess condition_subset into clusters for accurate visualization.

mast_cells_cluster@graphs <- list()
condition_subset <- subset(mast_cells_cluster, idents = c("Lean", "T2D"))
condition_subset <- RunPCA(condition_subset, dims = 1:30, verbose = FALSE) %>%
  RunTSNE(dims = 1:30, verbose = FALSE) %>%
  FindNeighbors(verbose = TRUE) %>%
  FindClusters(resolution = 0.1, verbose = TRUE)
Computing nearest neighbor graph
Computing SNN
Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck

Number of nodes: 1410
Number of edges: 48291

Running Louvain algorithm...
0%   10   20   30   40   50   60   70   80   90   100%
[----|----|----|----|----|----|----|----|----|----|
**************************************************|
Maximum modularity in 10 random starts: 0.9153
Number of communities: 3
Elapsed time: 0 seconds
TSNEPlot(condition_subset)+ggtitle("Lean and T2D Conditions")

Finding DEGs Based on these Specific Diabetic Conditions

Find DEGs between Lean and T2D conditions.

condition_subset <- Seurat::FindAllMarkers(condition_subset, only.pos = FALSE, test.use = "roc")

This can be saved both as a CSV for later use on Excel.

write.csv(condition_subset, "condition_subset.csv")

Organizing DEG Lists

In this final step, the DEG list we’ve obtained, condition_subset, needs to be organized better for easier analysis. We organize it in two ways. First is based on average log2FC, because increasing positive log fold change denotes up regulation in genes. Second is based on removing discrepant genes such as mitochondrial (MT) and ribosome (RP) genes using the function !grepl.

condition_subset$genes <- rownames(condition_subset)
condition_subset <- dplyr::filter(condition_subset, !grepl("MT-", condition_subset$genes))
condition_subset <- dplyr::filter(condition_subset, !grepl("RP", condition_subset$genes))
sorted_mast_cells_condition_subset <- condition_subset[order(condition_subset$avg_log2FC, decreasing = TRUE), ]

head(sorted_mast_cells_condition_subset)

Saving and Viewing an Organized DEG Table:

After organizing through average log2FC and discrepant genes, we save as csv file and export to Excel for further analysis.

View(sorted_mast_cells_condition_subset)
write.csv(sorted_mast_cells_condition_subset, "sorted_mast_cells_condition_subset.csv")
---
title: "scRNA Sequencing Analysis of Mast Cells in Diabetic Pancreatic Microenvironment"
author: "Akilesh Shankar"
output:
  html_notebook: default
  pdf_document: default
Date: July 2024
editor_options:
  markdown:
    wrap: 72
---
### Loading R Packages
First, load these packages, since all of them are required to execute further steps. If you can't load them / are missing one, please use install.packages() to re install or download them, and then run the code. Note that Seurat requires specific versions to work correctly; we typically use v.4.4.0, but the latest version (5.1.0) could also work.
```{r}
suppressMessages(library(mclust))
suppressMessages(library(MatrixExtra))
suppressMessages(library(ggplot2))
suppressMessages(library(dplyr))
suppressMessages(library(sctransform))
suppressMessages(library(future))
suppressMessages(library(Seurat))
suppressMessages(library(cowplot))
suppressMessages(library(stringr))
suppressMessages(library(patchwork))
suppressMessages(library(Matrix))
suppressMessages(library(Scillus))
suppressMessages(library(MAST))
suppressMessages(library(tidyr))
suppressMessages(library(ggthemes))
suppressMessages(library(magrittr))
suppressMessages(library(SeuratData))
suppressMessages(library(SeuratObject))
suppressMessages(library(devtools))
suppressMessages(library(htmltools))
suppressMessages(library(ScaledMatrix))
suppressMessages(library(sparseMatrixStats))
suppressMessages(library(DelayedMatrixStats))
suppressMessages(library(BPCells))
suppressMessages(library(tidyr))
```


### Loading Seurat Objects 
Load the seurat object that contains the single cell data obtained from samples. A seurat object is a dataset that contains the original data of the donors, along with cell types identified, diabetic status, etc. Load the metadata to look at all information contained in this seurat object in various categories. In this case, reharm_SCT_2024Jun18_small is a downsampled version of a processed seurat object containing human pancreatic islet sample data. 
```{r}
reharm_SCT_2024Jun18_small <- readRDS("~/AS/objects/reharm_SCT_2024Jun18_small.rds")

reharm_SCT_2024Jun18_small

reharm_SCT_2024Jun18_small@meta.data
```

### Viewing Cell Types in Seurat Objects
We are interested in viewing different cell types, so we use Idents() and levels(). This shows us different cell types associated with endocrine and inflammatory processes such as alpha, beta, delta-gamma, macrophages, mast cells, etc.
```{r}
Idents(reharm_SCT_2024Jun18_small) <- "cell_type"
levels(reharm_SCT_2024Jun18_small)
```


### Deleting Cell Types Not Relevant to Seurat Objects
We view the cell types present in our large seurat object. We see that there are cell types associated with low quality, and it would be beneficial to the dataset if we removed them. To do this, we follow the code below.
```{r}
reharm_SCT_2024Jun18_small <- subset(reharm_SCT_2024Jun18_small, idents = c("LowQuality_Endocrine", "LowQuality_Endothelial", "LowQuality_Exo", "LowQuality"), invert = TRUE)
```


### Visualizing Cell Types Using Dimensional Plots.
Now that we have removed low quality cell types, we re-construct a DimPlot with an updated registry of cell types for our reference.
```{r}
DimPlot(reharm_SCT_2024Jun18_small)+ggtitle("ALL CELL TYPES")
```
### Subsetting Cells of Interest from Seurat Object
Now, we need to make a subset from this data that contains only mast cells, because we are interested in characterizing mast cells in diabetic patients. This generates a new seurat object called mast_cells_cluster. If you want to subset some other type of cells, use the same syntax but edit to include your cell type instead.
```{r}
mast_cells_cluster <- subset(reharm_SCT_2024Jun18_small, idents = "Mast")
levels(mast_cells_cluster)
```

### Seurat Clusters and How to View Them
Within mast cells, we need to view the different clusters that have been generated under this seurat object. Seurat clusters are usually named 1, 2, 3, etc. that are used to cluster the cells together for more easier organization.
```{r}
Idents(mast_cells_cluster) <- "seurat_clusters"
levels(mast_cells_cluster)
```

### Single Cell RNA Sequencing of Seurat Clusters
Now, it's time to prepare this data for scRNA seq analysis. This is done through running a PCA analysis and then running a UMAP analysis to obtain a UMAP plot (can also use TSNE, as they both test linear dimensionality to produce a DimPlot). Then, we use FindNeighbors to compute a shared nearest neighbors (SNN) graph, which helps in clustering.
```{r}
mast_cells_cluster <- RunPCA(mast_cells_cluster, dims = 1:30, verbose = FALSE) %>%
  RunTSNE(dims = 1:30, verbose = FALSE) %>%
  FindNeighbors(verbose = TRUE) %>%
  FindClusters(resolution = 0.1, verbose = TRUE)
```


### Saving the Analyzed Cells as a new Seurat Object
We have now obtained a seurat object with our required scRNA seq parameters. Save this object using saveRDS().
```{r}
saveRDS(mast_cells_cluster, "mast_cells_cluster.rds")
```



### Visualizing This Object as a TSNE Dimensional Plot
Here's a TSNE plot to exhibit our clustering results:
```{r}
TSNEPlot(mast_cells_cluster, group.by = "seurat_clusters")
```


And a TSNE plot to show the same, grouped by heterogenous donors, and conditions:
```{r}
TSNEPlot(mast_cells_cluster, group.by = "Donor")
```


```{r}
TSNEPlot(mast_cells_cluster, group.by = "Condition")
```


### Finding Markers for Differential Expressed Genes
Now we need to find marker genes for this processed mast cell cluster object under the 3 clusters identified. Obtaining these marker genes will help us downstream in seeing which genes are expressed when and why. They are called differential expressed genes (DEG).
```{r, echo=TRUE, results='hide'}
Idents(mast_cells_cluster) <- "seurat_clusters"
mast_cell_markers <-  Seurat::FindAllMarkers(mast_cells_cluster, only.pos = FALSE, test.use = "roc")
```


Save markers as table.
```{r}
write.csv(mast_cell_markers, "mast_cell_markers_clean.csv")
head(mast_cell_markers_clean)
```



### Classifying DEG Markers Based on Diabetic Conditions:
Change Idents() to look at differential expressed genes by condition, because we want to view the marker genes we obtained before based on Lean condition (which is non-diabetic), and T2D condition. We ignore Obese group because it contains both diabetic and non-diabetic samples, which has shown to produce discrepancy in results.
```{r}
Idents(mast_cells_cluster) <- "Condition"
levels(mast_cells_cluster)
```



### Visualizing Such Markers through TSNE Dimensional Plots
Let's now view the analysis we've done using TSNEPlot. We first obtain a general plot of all conditions for reference. You can also do this using UMAPPlot or DimPlot.
```{r}
TSNEPlot(mast_cells_cluster,group.by = "Condition")
```



### Subsetting Markers into Diabetic Conditions of Interest
We further subset the seurat object mast_cells_cluster based on the conditions we want, which are lean and T2D. We then reprocess condition_subset into clusters for accurate visualization.
```{r}
mast_cells_cluster@graphs <- list()
condition_subset <- subset(mast_cells_cluster, idents = c("Lean", "T2D"))
condition_subset <- RunPCA(condition_subset, dims = 1:30, verbose = FALSE) %>%
  RunTSNE(dims = 1:30, verbose = FALSE) %>%
  FindNeighbors(verbose = TRUE) %>%
  FindClusters(resolution = 0.1, verbose = TRUE)
TSNEPlot(condition_subset)+ggtitle("Lean and T2D Conditions")
```

### Finding DEGs Based on these Specific Diabetic Conditions
Find DEGs between Lean and T2D conditions.
```{r, echo=TRUE, results='hide'}
condition_subset <- Seurat::FindAllMarkers(condition_subset, only.pos = FALSE, test.use = "roc")
```

This can be saved both as a CSV for later use on Excel.
```{r}
write.csv(condition_subset, "condition_subset.csv")
```


### Organizing DEG Lists
In this final step, the DEG list we've obtained, condition_subset, needs to be organized better for easier analysis. We organize it in two ways. First is based on average log2FC, because increasing positive log fold change denotes up regulation in genes. Second is based on removing discrepant genes such as mitochondrial (MT) and ribosome (RP) genes using the function !grepl.
```{r}
condition_subset$genes <- rownames(condition_subset)
condition_subset <- dplyr::filter(condition_subset, !grepl("MT-", condition_subset$genes))
condition_subset <- dplyr::filter(condition_subset, !grepl("RP", condition_subset$genes))
sorted_mast_cells_condition_subset <- condition_subset[order(condition_subset$avg_log2FC, decreasing = TRUE), ]

head(sorted_mast_cells_condition_subset)
```

### Saving and Viewing an Organized DEG Table:
After organizing through average log2FC and discrepant genes, we save as csv file and export to Excel for further analysis.
```{r}
View(sorted_mast_cells_condition_subset)
write.csv(sorted_mast_cells_condition_subset, "sorted_mast_cells_condition_subset.csv")
```
